We use data from the study “The Demand for, and Impact of, Learning HIV Status”, conducted in Malawi. The study uses a randomized controlled trial (RCT) design in which individuals received varying monetary incentives to learn their HIV status after taking an HIV test.
Study: Thornton, Rebecca L. 2008. “The Demand for, and Impact of, Learning HIV Status.” American Economic Review, 98 (5): 1829-63.
Data file: Click here
Detailed description of the intervention: Click here
For the analysis, we use the “Thornton HIV Testing Data.dta” file.
Import the data
Execution in R
The data file is a Stata (.dta) file. To import the dataset in R, we will need to install the haven package in R and use the read_dta() function. Run the following code in R to install the haven package:
install.packages("haven")
The downloaded files come with a readme document, which gives a detailed description of the variables used in the study.
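A minimal sketch of the import step is shown below. It assumes the files were unzipped under C:/Data analysis, the folder used in the Stata example; the names data_dir and data_path are illustrative, and data_1 is the object name used by the later R code. The read is guarded so that the snippet does not fail if the file or the haven package is missing.

```r
# Sketch: load the Stata file with haven::read_dta().
# ASSUMPTION: the data sit in the same folder as in the Stata example;
# adjust data_dir to wherever you saved the download.
data_dir  <- "C:/Data analysis/Thornton data/Data"
data_file <- "Thornton HIV Testing Data.dta"
data_path <- file.path(data_dir, data_file)

if (file.exists(data_path) && requireNamespace("haven", quietly = TRUE)) {
  data_1 <- haven::read_dta(data_path)  # returns a tibble
  print(dim(data_1))                    # number of rows and columns
} else {
  message("Could not find the file or the haven package: ", data_path)
}
```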
Execution in Stata
Use the cd command to set the working directory, then the use command to load the dataset.
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
Create treatment variable
We create a variable called treatment, which takes on a value of 1 if the participant received any financial incentive, and otherwise takes on a value of 0. The variable tinc records the amount of monetary incentive received by the respondents. We label the values of 0 and 1 as control and treatment.
Execution in R
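A minimal sketch of this step in base R follows. It runs on a small made-up data frame rather than the real file, so the tinc values are placeholders. The factor labels "Control" and "Treatment" match the values that the later R code filters on.

```r
# Toy stand-in for the real data: tinc is the incentive amount.
# ASSUMPTION: these values are invented for illustration only.
data_1 <- data.frame(tinc = c(0, 0, 50, 100, 300, NA))

# 1 if any incentive was received, 0 otherwise; a missing tinc stays NA
data_1$treatment <- factor(
  ifelse(data_1$tinc > 0, 1, 0),
  levels = c(0, 1),
  labels = c("Control", "Treatment")
)

# Count respondents in each arm, showing NAs if there are any
table(data_1$treatment, useNA = "ifany")
```

Note that ifelse() leaves a missing tinc as NA. In Stata, by contrast, cond(tinc>0, 1, 0) returns 1 when tinc is missing, because Stata treats missing values as larger than any number. This is why the Stata code later drops observations with missing(tinc).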
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. generate treatment = cond(tinc>0, 1, 0)
. label define treatment 0 "Control" 1 "Treatment"
. label val treatment treatment
Calculating the compliance rate
In this analysis, we try to study the effect of learning one’s HIV status on the decision to purchase a condom. We focus on a sub-group of individuals who are sexually active and HIV positive. To do this, we need to restrict our sample to the sexually active and HIV positive individuals and calculate the compliance rate for this sub-group.
Baseline data were collected in 2004, and follow-up data were collected in 2005.
Variable description:
treatment: takes on the value of 1 if individual received monetary incentive and 0 otherwise.
hadsex12: Indicator for whether the respondent reported having sex in the past 12 months at baseline (1 = Yes, 0 = No).
hiv2004: HIV test result (1 = HIV positive, 0 = HIV negative, -1 = Indeterminate)
got: Indicator if obtained HIV results (1 = learned HIV results)
anycond: Indicator of any condom purchased at the follow-up survey
Execution in R
library(dplyr)  # provides filter(), mutate(), group_by(), summarize()

data_1 <- data_1 |>
  filter(hadsex12 == 1,     # keep sexually active respondents
         hiv2004 == 1,      # keep HIV-positive respondents
         !is.na(got),       # drop rows with missing got
         !is.na(anycond))   # drop rows with missing anycond
# flag respondents whose behaviour matched their assignment:
# got == 1 in the treatment group, got == 0 in the control group
data_1 <- data_1 |>
  mutate(followed_treatment = ifelse(treatment == "Treatment", got, 1 - got))
# tabulate followed_treatment given treatment == 1
trt_dat <- data_1 |>
  filter(treatment == "Treatment") |>
  select(followed_treatment) |>
  group_by(followed_treatment) |>
  summarize(Count = n()) |>
  mutate(Percent = Count / sum(Count))
print(trt_dat)
# A tibble: 2 × 3
followed_treatment Count Percent
<dbl> <int> <dbl>
1 0 12 0.286
2 1 30 0.714
# tabulate followed_treatment given treatment == 0
cntrl_dat <- data_1 |>
  filter(treatment == "Control") |>
  select(followed_treatment) |>
  group_by(followed_treatment) |>
  summarize(Count = n()) |>
  mutate(Percent = Count / sum(Count))
print(cntrl_dat)
# A tibble: 2 × 3
followed_treatment Count Percent
<dbl> <int> <dbl>
1 0 3 0.3
2 1 7 0.7
# calculate the compliance rate
compliance_rate <- 71.4-30.0
print(compliance_rate)
[1] 41.4
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
generate followed_treatment = cond(treatment == 1, got, 1-got)
tab followed_treatment if treatment == 0
tab followed_treatment if treatment == 1
dis "Compliance rate =" 71.4 - 30
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. generate treatment = cond(tinc>0, 1, 0)
. label define treatment 0 "Control" 1 "Treatment"
. label val treatment treatment
. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)
. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)
. generate followed_treatment = cond(treatment == 1, got, 1-got)
. tab followed_treatment if treatment == 0
followed_tr |
eatment | Freq. Percent Cum.
------------+-----------------------------------
0 | 3 30.00 30.00
1 | 7 70.00 100.00
------------+-----------------------------------
Total | 10 100.00
. tab followed_treatment if treatment == 1
followed_tr |
eatment | Freq. Percent Cum.
------------+-----------------------------------
0 | 12 28.57 28.57
1 | 30 71.43 100.00
------------+-----------------------------------
Total | 42 100.00
. dis "Compliance rate =" 71.4 - 30
Compliance rate =41.4
Here, 71.4% of the treatment group learned their HIV status, compared with 30% of the control group. (In the control group, followed_treatment equals 1 - got, so the 30% with followed_treatment = 0 are the respondents who learned their status without an incentive.) The compliance rate is the difference between the share of the treatment group that learned their status (71.43%) and the share of the control group that did so (30%). Hence, the compliance rate for the experiment is 41.4 percentage points (71.43 - 30).
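The compliance rate can also be computed from the data instead of being typed in by hand. The sketch below rebuilds only the treatment and got columns from the counts in the tables above (30 of 42 in the treatment group and 3 of 10 in the control group learned their status). The data frame toy and the other object names are illustrative, not part of the original analysis.

```r
# Rebuild treatment and got from the tabulated counts (not the real row-level data)
toy <- data.frame(
  treatment = rep(c("Treatment", "Control"), times = c(42, 10)),
  got       = c(rep(c(1, 0), times = c(30, 12)),  # treatment group
                rep(c(1, 0), times = c(3, 7)))    # control group
)

# Share who learned their HIV status in each arm
share_got <- tapply(toy$got, toy$treatment, mean)

# Compliance rate = difference in shares, in percentage points
compliance_rate <- 100 * unname(share_got["Treatment"] - share_got["Control"])
print(round(compliance_rate, 1))
```

Computing the difference directly avoids rounding the two shares before subtracting them.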
Calculating the Intent to Treat Effect and the Local Average Treatment Effect (LATE) estimate
In R, we use the lm_robust() function from the estimatr package to run a regression with robust standard errors. In Stata, we use the regress command with the robust option for the same.
Execution in R
Call:
lm_robust(formula = anycond ~ treatment, data = data_1, se_type = "HC1")
Standard error type: HC1
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.2000 0.1290 1.550 0.1273 -0.05910 0.4591 50
treatmentTreatment 0.2286 0.1507 1.517 0.1356 -0.07408 0.5312 50
Multiple R-squared: 0.03429 , Adjusted R-squared: 0.01497
F-statistic: 2.301 on 1 and 50 DF, p-value: 0.1356
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
regress anycond treatment, robust
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. generate treatment = cond(tinc>0, 1, 0)
. label define treatment 0 "Control" 1 "Treatment"
. label val treatment treatment
. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)
. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)
. regress anycond treatment, robust
Linear regression Number of obs = 52
F(1, 50) = 2.30
Prob > F = 0.1356
R-squared = 0.0343
Root MSE = .48756
------------------------------------------------------------------------------
| Robust
anycond | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treatment | .2285714 .1506789 1.52 0.136 -.0740761 .531219
_cons | .2 .1289961 1.55 0.127 -.0590963 .4590963
------------------------------------------------------------------------------
The estimates show that 20% of the sexually active, HIV-positive individuals who did not receive any monetary incentive to learn their HIV status purchased condoms. Individuals who received a monetary incentive were 22.86 percentage points more likely to purchase condoms. This is the intent-to-treat effect. Although the point estimate is positive, it is not statistically significant at conventional levels (p = 0.136), and the 95% confidence interval (-0.07 to 0.53) includes zero.
Next, we combine two regressions to calculate the Local Average Treatment Effect (LATE): anycond on treatment, which gives the intent-to-treat effect, and got on treatment, which gives the compliance rate.
LATE = Intent to Treat / Compliance rate
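Written out as a formula, this ratio is the Wald estimator. The symbols are our own notation rather than the study's: Y is anycond, D is got, and Z is treatment. The numbers are the intent-to-treat estimate and the compliance rate reported above.

```latex
\widehat{\mathrm{LATE}}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
  = \frac{\text{Intent to Treat}}{\text{Compliance rate}}
  \approx \frac{0.2286}{0.4143}
  \approx 0.552
```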
Execution in R
Call:
lm_robust(formula = got ~ treatment, data = data_1, se_type = "HC1")
Standard error type: HC1
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.3000 0.1478 2.030 0.04769 0.003168 0.5968 50
treatmentTreatment 0.4143 0.1640 2.526 0.01474 0.084898 0.7437 50
Multiple R-squared: 0.115 , Adjusted R-squared: 0.09727
F-statistic: 6.382 on 1 and 50 DF, p-value: 0.01474
LATE <- 0.2285714/0.4142857
print(LATE)
[1] 0.5517241
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
regress got treatment, robust
dis "LATE =" 0.2285714/0.4142857
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. generate treatment = cond(tinc>0, 1, 0)
. label define treatment 0 "Control" 1 "Treatment"
. label val treatment treatment
. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)
. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)
. regress got treatment, robust
Linear regression Number of obs = 52
F(1, 50) = 6.38
Prob > F = 0.0147
R-squared = 0.1150
Root MSE = .46198
------------------------------------------------------------------------------
| Robust
got | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treatment | .4142857 .1639922 2.53 0.015 .0848976 .7436738
_cons | .3 .1477836 2.03 0.048 .0031679 .5968321
------------------------------------------------------------------------------
. dis "LATE =" 0.2285714/0.4142857
LATE =.55172409
The coefficient on treatment in this regression equals the compliance rate we calculated earlier (0.414). Among sexually active, HIV-positive respondents, we estimate that learning one’s HIV status increases the likelihood of purchasing condoms by about 55.17 percentage points. However, calculating the LATE by hand this way gives no standard error, so we cannot tell whether the estimate is statistically significant. An alternative is to estimate the LATE by two-stage least squares (2SLS), using treatment as an instrument for got.
Execution in R
Call:
iv_robust(formula = anycond ~ got | treatment, data = data_1,
se_type = "HC1")
Standard error type: HC1
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.03448 0.1561 0.2208 0.82612 -0.279152 0.3481 50
got 0.55172 0.2729 2.0219 0.04855 0.003643 1.0998 50
Multiple R-squared: 0.1776 , Adjusted R-squared: 0.1612
F-statistic: 4.088 on 1 and 50 DF, p-value: 0.04855
Execution in Stata
cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
keep if hadsex12 == 1 & hiv2004 == 1
drop if missing(tinc) | missing(got) | missing(anycond)
ivregress 2sls anycond (got = treatment), robust
. cd "C://Data analysis"
C:\Data analysis
. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear
. generate treatment = cond(tinc>0, 1, 0)
. label define treatment 0 "Control" 1 "Treatment"
. label val treatment treatment
. keep if hadsex12 == 1 & hiv2004 == 1
(4,698 observations deleted)
. drop if missing(tinc) | missing(got) | missing(anycond)
(70 observations deleted)
. ivregress 2sls anycond (got = treatment), robust
Instrumental variables (2SLS) regression Number of obs = 52
Wald chi2(1) = 4.25
Prob > chi2 = 0.0392
R-squared = 0.1776
Root MSE = .44118
------------------------------------------------------------------------------
| Robust
anycond | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
got | .5517241 .2675736 2.06 0.039 .0272896 1.076159
_cons | .0344828 .1531167 0.23 0.822 -.2656204 .3345859
------------------------------------------------------------------------------
Instrumented: got
Instruments: treatment
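The coefficient on got in the 2SLS regression equals the LATE we calculated by hand (0.5517). The p-value for got is 0.039 in Stata and 0.049 in R. The two differ because Stata's ivregress reports a z statistic by default, while iv_robust() applies a small-sample adjustment to the standard error and uses a t distribution with 50 degrees of freedom. Either way, the estimate is statistically significant at the 5% level, so we conclude that learning about one’s HIV-positive status increases the likelihood of purchasing condoms. With only 52 observations, the confidence interval is wide, so the size of the effect is estimated imprecisely.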
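The equality between the 2SLS coefficient and the hand-calculated ratio is not a coincidence: with a single binary instrument, the IV estimator cov(Y, Z) / cov(D, Z) reduces to the ratio of the two differences in means. The base R sketch below checks this on a toy data set rebuilt from the group counts implied by the output above. The pairing of got and anycond within each arm is arbitrary, so it reproduces the point estimate only, not the standard errors. The object names z, d, and y are illustrative.

```r
# Counts implied by the reported output (NOT the real row-level data):
#   treatment arm: n = 42, 30 learned status, 18 bought condoms (0.4286 * 42)
#   control arm:   n = 10,  3 learned status,  2 bought condoms (0.2 * 10)
# ASSUMPTION: the within-arm pairing of d and y is arbitrary.
z <- rep(c(1, 0), times = c(42, 10))                                     # treatment
d <- c(rep(c(1, 0), times = c(30, 12)), rep(c(1, 0), times = c(3, 7)))   # got
y <- c(rep(c(1, 0), times = c(18, 24)), rep(c(1, 0), times = c(2, 8)))   # anycond

# Intent-to-treat effect and compliance rate as differences in means
itt         <- mean(y[z == 1]) - mean(y[z == 0])
first_stage <- mean(d[z == 1]) - mean(d[z == 0])
wald        <- itt / first_stage

# IV estimator with one instrument: cov(y, z) / cov(d, z)
iv <- cov(y, z) / cov(d, z)

print(c(itt = itt, first_stage = first_stage, wald = wald, iv = iv))
```

Because only the group means enter the ratio, any row-level pairing gives the same point estimate; the standard error does depend on the pairing, which is why the real data and a 2SLS routine are needed for inference.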